Abstract

Over the past 15 years, the biomedical research community has increased its efforts to produce ontologies encoding
biomedical knowledge, and to provide the corresponding infrastructure to maintain them. As ontologies are 
becoming a central part of biological and biomedical research, a communication channel to publish frequent updates 
and latest developments on them would be an advantage. 

Here, we introduce the JBMS thematic series on Biomedical Ontologies. The aim of the series is to disseminate the 
latest developments in research on biomedical ontologies and provide a venue for publishing newly developed 
ontologies, updates to existing ontologies as well as methodological advances, and selected contributions from 
conferences and workshops. We aim to give this thematic series a central role in the exploration of ongoing research 
in biomedical ontologies and intend to work closely together with the research community towards this aim. 
Researchers and working groups are encouraged to provide feedback on novel developments and special topics to 
be integrated into the existing publication cycles. 



Introduction 

This editorial explores the expectations linked to
the growing infrastructure around biomedical ontologies.
Since ontologies have become an integral part of
biological and biomedical research for the annotation
of data and for its integration, analysis, and visualization
[1], there is a demand for a place in which the scientific
community can be made aware of new ontologies, major
updates to existing ontologies, the development of and
updates to ontology-based tools, and the discussion of
ontology-based methods. The JBMS thematic series on
'Biomedical Ontologies', and the annual JBMS Ontology
Issue, will fill these gaps and establish a hub of
information about biomedical ontologies and their
scientific applications.

The role of ontologies in biological and biomedical
research has steadily increased in conjunction with the
increase in quality and quantity of data being collected
in all areas of biology. Not only are the number and size
of ontologies increasing, their relevance in biomedical
research rising, and their reach extending into more
areas of biology and biomedicine; ontologies have
also begun to play a key part in the interpretation of
biomedical data and to inspire the development
of new tools for end users and new analysis methods for
biomedical scientists. As a result, data integration and
interoperability have become a relevant cost factor in the
execution of big data projects, which has been acknowledged
by national and international projects, for example by the
Elixir initiative, which aims to establish a biomedical IT
infrastructure across Europe [2]. The development and
application of ontologies will be an integral part of such
an infrastructure, mainly because data interoperability
requires tools to explicitly describe the semantics
of terms used to characterize the features of data, and
ontologies are widely used to fill this role.

Which 'ontology' did you mean? 

There has been considerable debate in the ontology 
research community as to what constitutes an ontology 
in biology [3-5] and what properties an ontology should 
have. Traditional axes of classification for ontologies 
include the expressivity of the language used to develop 
and distribute the ontologies, the applications for which 
the ontologies are intended (i.e. who uses the ontology and 
how) and the domain covered by the ontology. Arguments 
pertain
(a) To the degree of formality of the language used to
express the information in an ontology, i.e., whether 
a formal language such as the Web Ontology 
Language (OWL) [6] is used or a graph-based 
representation without explicit formal semantics, 

(b) To the complexity of the ontology description, i.e., 
whether rich axioms and relations are used or 
whether a taxonomy, accompanied by textual
definitions of classes in an ontology, is sufficient, 

(c) To the interpretation of what constitutes a "class" or 
"relation" in an ontology, i.e., whether a class in an 
ontology refers to something in the world or to a 
mental construct, and 

(d) To the orthogonality of the content, i.e., what 
content has been incorporated from other ontologies 
and for which purposes. 

Depending on the intended applications, artifacts called 
"ontologies" are developed with any combination of these 
properties. 

In the JBMS thematic series on Biomedical Ontolo- 
gies, we employ a broad interpretation of "ontology" and 
include artifacts that primarily provide vocabularies for 
the purpose of data annotation as well as formal theo- 
ries that provide a rich representation of certain aspects of 
biomedicine. To annotate data within a database, a taxon- 
omy of classes with labels and textual definitions is often 
sufficient, while more expressive formal constructs would 
be required if the ontology is developed to verify data 
integrity. 

Representing ontologies 

The annotation of research data using an ontology enables
integration of data both within a database and across
multiple databases [1]. Ontologies provide a controlled
set of classes together with an explicit (formal or
informal) representation of their meaning, a hierarchy between
these classes, and complex axiom patterns ("relations")
[7] between the classes. When shared across multiple
databases, ontologies facilitate data integration. The
taxonomic relations allow integration through general or
specific aspects even if exact matches between data items
cannot be identified, and axioms between classes serve as
complex relations that facilitate further data integration.

Today, most biomedical ontologies are developed in 
shared formal languages, either the OBO Flatfile Format 
[8] or the Web Ontology Language (OWL) [6]. Both lan- 
guages are tightly coupled and thus allow translations 
between them [9,10] so that the OBO Flatfile Format can 
now be considered to be a fragment of OWL [8]. 

The expressivity of a biomedical ontology is determined
by the particular subset of OWL that is used to
formulate it, and serves as a major distinguishing
factor. It characterizes the knowledge that can
be expressed (such as whether the ontology may contain
contradictions) and determines the complexity of general
tasks such as querying the ontology and categorizing
data with the ontology. Apart from OWL DL, OWL 2
comprises three emerging profiles (OWL EL, OWL QL and
OWL RL) [11]. The OWL EL profile forms a subset
which allows the specification of (a) a taxonomy between
classes (i.e., stating that one class is a subclass of
another), (b) existential restrictions (i.e., stating that
instances of one class must stand in a relation to some
instance of another class), and (c) disjointness of classes
(i.e., stating that two classes cannot share any instances).
It has been found useful for a significant number of
biomedical ontologies [12-15].
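
The three kinds of statements listed above can be illustrated with a
minimal Python sketch. It is not an OWL reasoner and does not use any
OWL library; the class names, the relation name and the helper
functions are hypothetical and chosen only for illustration.

```python
# Hypothetical toy axioms; all class and relation names are illustrative.
# (a) Taxonomy: each class maps to its directly asserted superclasses.
SUBCLASS_OF = {
    "Neuron": {"Cell"},
    "Cell": {"AnatomicalEntity"},
    "Nucleus": {"CellPart"},
    "CellPart": {"AnatomicalEntity"},
}
# (b) Existential restrictions: every instance of the key class stands in
# the given relation to some instance of the target class.
SOME_VALUES_FROM = {"Cell": [("has_part", "Nucleus")]}
# (c) Disjointness: pairs of classes that cannot share any instance.
DISJOINT = [("Cell", "CellPart")]


def superclasses(cls):
    """Return cls together with all asserted and inferred superclasses."""
    seen, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inherited_restrictions(cls):
    """Collect existential restrictions stated on cls or a superclass."""
    return sorted(
        r for sup in superclasses(cls) for r in SOME_VALUES_FROM.get(sup, [])
    )


def unsatisfiable(cls):
    """True if cls is subsumed by two classes declared disjoint."""
    sups = superclasses(cls)
    return any(a in sups and b in sups for a, b in DISJOINT)
```

In this toy example, "Neuron" inherits the restriction
("has_part", "Nucleus") from "Cell", and a class asserted to be a
subclass of both "Cell" and "CellPart" would be reported as
unsatisfiable.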

Domains of ontologies and their applications 

Ontologies are of particular importance in domains in 
which large volumes of data are being generated, and the 
emergence of high-throughput technologies has increased 
the importance of ontologies in some domains. In the 
1990s, research on discovering gene functions in diverse 
organisms required a means to standardize gene functions 
for comparison within and across multiple organisms: 
this need induced the development of the Gene Ontol- 
ogy (GO), which turned into one of the most important 
resources in genomics research [16]. In a similar way, the 
Sequence Ontology (SO) [17] emerged as a response to 
the availability of more and more sequencing data, and to 
provide compatibility between different data formats for 
biological sequences and their features. 

Different anatomy ontologies specify the organismal 
components for multiple species, and - on a smaller scale 
of granularity - the developmental relations and features 
of cell types are characterized by the Celltype Ontology 
[18]. Phenotype ontologies are also available for multi- 
ple species and are widely used for the annotation of 
the abnormalities observed in mutagenesis experiments 
[19-21] as well as for the characterization of diseases and 
drug effects [22]. 

Further domains covered include chemical entities, used to
annotate drugs and their biological activities [23],
structures, and pharmaceutical applications [23,24] for data
interoperability [25]. Ontologies for experimental
settings, e.g., the Bio Assay Ontology [26], the Experimental
Factor Ontology [27], the eagle-i ontology [28] and the
Ontology of Biomedical Investigations [29], capture the
biomedical metadata that characterize experiments.
Similarly, ontologies for environmental conditions describe data
samples and the surroundings in which they were encountered
[27,30]. Ontologies are also being used to annotate and
classify journal articles [31,32], pathways [33], and specific
biological entities [34].

Ontologies, together with their annotations, are extensively
used in the analysis of biomedical data, for example
in the form of Gene Set Enrichment Analysis (GSEA) [35]
for the interpretation of gene expression datasets. GSEA 
makes use of the structure of the Gene Ontology to iden- 
tify statistically over- or under-represented classes based 
on gene expression observed in two biological states. Sim- 
ilar methods are also applied to other ontologies such as 
the Human Disease Ontology [36], the Neuro Behavior 
Ontology [37], or even the full set of ontologies contained 
in BioPortal [38]. 
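
How the ontology structure enters such an analysis can be sketched in
Python. The example below is not GSEA itself, which works on ranked
gene lists; it is a simpler over-representation test based on the
hypergeometric distribution, applied after annotations have been
propagated up the taxonomy. The taxonomy, the gene identifiers and the
function names are hypothetical, and the study genes are assumed to be
drawn from the annotated population.

```python
from math import comb

# Hypothetical toy data: a small taxonomy (child -> parents) and direct
# gene annotations. Names are illustrative only.
PARENTS = {
    "apoptosis": {"cell death"},
    "necrosis": {"cell death"},
    "cell death": set(),
}
DIRECT = {
    "g1": {"apoptosis"}, "g2": {"apoptosis"}, "g3": {"necrosis"},
    "g4": set(), "g5": set(), "g6": set(),
}


def ancestors(term):
    """Return the term plus all terms reachable through PARENTS."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagated(gene):
    """Annotations of a gene after propagation up the taxonomy."""
    return {a for term in DIRECT[gene] for a in ancestors(term)}


def enrichment_p(term, study):
    """One-sided hypergeometric p-value for over-representation of term."""
    population = list(DIRECT)
    n_pop = len(population)
    n_pop_hits = sum(term in propagated(g) for g in population)
    n_study = len(study)
    n_study_hits = sum(term in propagated(g) for g in study)
    # Probability of seeing at least n_study_hits annotated genes by chance.
    tail = sum(
        comb(n_pop_hits, i) * comb(n_pop - n_pop_hits, n_study - i)
        for i in range(n_study_hits, min(n_pop_hits, n_study) + 1)
    )
    return tail / comb(n_pop, n_study)
```

For the study set g1, g2, g3, the general class "cell death" obtains a
p-value of 0.05, whereas the more specific class "apoptosis" obtains
0.2: the signal becomes visible only because annotations are propagated
to the superclass.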

Another analysis method relying on ontologies is to 
compare data items and identify meaningful biologi- 
cal relations between them based on semantic similarity 
[39]. This approach has been applied to identify protein- 
protein interactions [40], classify chemicals [41], suggest 
candidate genes involved in diseases [42,43] and repurpose
drugs [44,45]. When applying semantic similarity to
compare two data items, the choice of ontology determines
the kind of similarity that is revealed: using GO
will provide functional similarity, using ChEBI
will provide chemical structure similarity, and
using phenotype ontologies will result in phenotypic
similarity.
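
One simple way to compute such a similarity can be sketched as follows.
The measure shown here, the Jaccard index over the ancestor closures of
two annotation sets, is only one of many semantic similarity measures
and is not necessarily the one used in the cited studies. The taxonomy
and the function names are hypothetical.

```python
# Hypothetical toy taxonomy (child -> parents); names are illustrative.
PARENTS = {
    "kinase activity": {"catalytic activity"},
    "phosphatase activity": {"catalytic activity"},
    "catalytic activity": {"molecular function"},
    "DNA binding": {"binding"},
    "binding": {"molecular function"},
    "molecular function": set(),
}


def closure(terms):
    """Return the given terms plus all of their ancestors."""
    result = set()
    frontier = list(terms)
    while frontier:
        term = frontier.pop()
        if term not in result:
            result.add(term)
            frontier.extend(PARENTS.get(term, ()))
    return result


def jaccard_similarity(annotations_a, annotations_b):
    """Shared ancestors divided by all ancestors of the two items."""
    a, b = closure(annotations_a), closure(annotations_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With this toy taxonomy, an item annotated with "kinase activity" and
one annotated with "phosphatase activity" have a similarity of 0.5,
while "kinase activity" and "DNA binding" have a similarity of 0.2,
because they share only the root class.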

The integration of multiple ontologies - in particular 
from different domains - can reveal relations between 
annotated data items. For example, anatomy ontologies 
for cross-species comparisons - linking homologous or 
analogous anatomical structures - can be used to trans- 
fer and compare annotations for multiple species [15,46]. 
For this purpose, the UBERON anatomy ontology [15] was 
developed. It enables cross-species phenotype representations
that have been applied to deciphering human GWAS
data based on comparisons with mouse model pheno- 
types [47] as well as the prioritization of candidate genes 
and drug targets based on data from model organisms 
[42,44,48,49]. 

Additionally, the rich axiom systems of some ontolo- 
gies help to verify and classify data according to con- 
straints on biological entities expressed in the ontologies. 
One example of such an application has been the clas- 
sification of proteins using ontologies [50], in which an 
ontology provides rules according to which decisions 
about the protein family are made. The same, or sim- 
ilar, constraints expressed in ontologies can be used to 
verify data, i.e., determine whether a data item com- 
plies with the constraints expressed in the ontology or 
not [51]. 

The main challenges for research in biomedical 
ontologies 

Evaluation of ontologies and the development of a robust 
research methodology 

Establishing effective methods to evaluate ontologies - 
both qualitatively and quantitatively, if possible - towards 
fitness for a purpose is a major challenge in ontology
research [52]. Determining the "best" ontology for a given
purpose becomes important, and criteria such as the 
ontology structure, formality, its complexity, its coverage, 
as well as the amount of data annotated with it con- 
tribute to this decision. Effective methods for evaluation 
are particularly required for domains in which multiple 
ontologies overlap in their content and intended appli- 
cations, such as for human diseases where ICD, MeSH, 
SNOMED CT, the Human Disease Ontology [53], the 
Human Phenotype Ontology [22], the Unified Medical 
Language System (UMLS) [54], and more specific ontolo- 
gies such as the Infectious Disease Ontology [55], Malaria 
ontology [56], etc. are being used. 

The research methodology underlying the development 
of biomedical ontologies will also improve when effective 
evaluation criteria are being applied. The Ontology Sum- 
mit [57] has addressed this need with the topic "Ontology 
Evaluation Across the Ontology Lifecycle" in 2013, and 
ontology evaluation featured prominently in panel dis- 
cussions at the International Conference on Biomedical 
Ontologies 2013 (ICBO) and will play a prominent role 
at ICBO2014. The JBMS thematic series on Biomedi- 
cal Ontologies will follow the community discussions to 
address ontology evaluation principles and methods, and 
their instantiation in community-agreed guidelines and 
standards. 

Standards and Interoperability: Linked Data and beyond 

Another challenge is the efficient reuse of ontologies, and
the knowledge they contain, in the organization of open,
linked data that may be accessible through multiple public
interfaces (SPARQL endpoints) from different data providers
[58]. The main task is to balance the complexity
of processing and querying ontologies, which commonly
require the use of an automated reasoner, with the need
to efficiently query large, linked datasets. In particular,
when multiple ontologies are used to annotate datasets
and automated reasoning over these ontologies provides
the means for finding relations between their classes,
an infrastructure is needed that supports combining
queries over ontologies with SPARQL queries over linked
data.

Recently, some applications have come forward in which 
automated reasoning is used to answer complex queries 
over ontologies and subsequently retrieve data [59-61]. 
At the same time, major providers of biological and 
biomedical data such as the European Bioinformatics 
Institute (https://www.ebi.ac.uk/rdf/) and UniProt (http:// 
beta.sparql.uniprot.org/) provide access to their content 
through public SPARQL endpoints. In the future, we 
expect exciting applications that combine reasoning over 
ontologies in ontology repositories, such as the Ontol- 
ogy Lookup Service [62], BioPortal [63] or OntoBee [64], 
with (federated) SPARQL queries and provide a genuinely
knowledge-driven way for exploring linked biomedical
data.
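
A minimal Python sketch of this combination is given below. A plain
subclass closure stands in for the automated reasoner, and its result
is written into the VALUES clause of a SPARQL query. All IRIs, the
annotation property and the function names are hypothetical; the query
is only constructed as a string and is not sent to any endpoint.

```python
# Hypothetical subclass axioms (child -> parents) with made-up IRIs.
SUBCLASS_OF = {
    "http://example.org/onto#Hepatocyte": {"http://example.org/onto#EpithelialCell"},
    "http://example.org/onto#EpithelialCell": {"http://example.org/onto#Cell"},
    "http://example.org/onto#Neuron": {"http://example.org/onto#Cell"},
}


def descendants(cls):
    """Return cls plus all classes that are (indirect) subclasses of it."""
    found = {cls}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in found and parents & found:
                found.add(child)
                changed = True
    return found


def build_query(cls, annotation_property):
    """Build a SPARQL query for items annotated with cls or a subclass."""
    values = " ".join(f"<{iri}>" for iri in sorted(descendants(cls)))
    return (
        "SELECT ?item ?cls WHERE {\n"
        f"  VALUES ?cls {{ {values} }}\n"
        f"  ?item <{annotation_property}> ?cls .\n"
        "}"
    )


query = build_query(
    "http://example.org/onto#EpithelialCell",
    "http://example.org/vocab#annotatedWith",
)
```

The design choice here is to perform the ontology query first and pass
only its result to the data query, so that the endpoint itself does not
need to support reasoning.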

Knowledge-based analysis of biomedical data 

Integration of ontologies - and the knowledge they 
contain - in the analysis of biological and biomedical 
data is yet another challenge. Ontologies have been suc- 
cessfully integrated with biomedical analysis pipelines 
[35,39,65,66]. However, these analysis methods mainly 
exploit the ontologies' taxonomy and often make use of 
the axioms and constraints only implicitly. 

Many ontologies contain a lot more information than 
taxonomic relationships, and some recent work has begun 
to exploit some additional information - disjointness 
between classes in an ontology - to improve computation
of semantic similarity [67]. How the rich information
that is further contained in formalized ontologies can be
incorporated in the analysis of biomedical data remains
a research question, and novel methods will likely appear
as the infrastructure and tool support around ontologies
evolve.

The JBMS thematic series on "Biomedical 
ontologies" 

The JBMS thematic series on Biomedical Ontologies will 
provide the venue for publishing research about biomedi- 
cal ontologies, their development, integration and quality 
assurance. On a regular basis, we will have open calls for 
papers on specific topics, and we welcome community 
input for important challenges to address. 

The annual JBMS Ontology Issue will become a central 
part of the thematic series where we focus on ontologies 
that have already been demonstrated to be useful for sci- 
entific applications. The Ontology Issue is intended for a 
wide audience of readers; it does not specifically target 
researchers in ontology, but rather biological and biomed- 
ical researchers who may want to apply ontologies in their 
domain and require an overview of the currently available
artifacts they can already use. In the Ontology Issue,
new ontologies can be described as well as updates to 
existing ontologies. Updates in regular intervals produce 
a better understanding of the progress in developing an 
ontology and the major changes to its content, structure 
and applications. 

In the future, we aim to establish another regular call 
in which ontology-based tools and applications will be 
described and updates to these tools published. Addition- 
ally, the thematic series will provide a venue to publish 
conference and workshop papers, and interested work- 
ing groups are encouraged to suggest special topics or to 
contribute to existing publication cycles. 

We aim to make the JBMS thematic series on Biomedical
Ontologies take a central role in the exploration of
current research in biomedical ontologies, and we intend
to work closely with the research community to achieve
this aim. All researchers are invited to express ideas and
demands, ask for feedback on topics, and provide suggestions
for novel developments.